The New tiCrypt Network Architecture Based on OpenVSwitch
Motivation
The "traditional" LibVirt networking is based on Linux bridges. This architecture is a simple yet effective way to provide network connectivity to VMs, and it is sufficient when all the VMs run on a single server. When the VMs run on multiple servers, however, the Linux bridge architecture becomes more complex and less efficient. Specifically, in the case of tiCrypt, it creates the following issues:
- Host network isolation: The Linux bridge network is confined to the host it is defined on. The network can be extended using routing, but this creates significant complexity.
- IP management complexity: IP assignment becomes very difficult since each host must have its own IP range.
- Control of external access: tiCrypt needs to control external access to the VMs, and this is more difficult to achieve with Linux bridges since firewall rules must be defined on each host.
- External proxied access: tiCrypt needs to provide external proxied access to the VMs. This is accomplished by mapping port ranges on the host to the possible VM IPs (port 22). Such mapping rules pollute the firewall rules on the hosts.
- VM migration: The Linux bridge architecture does not support VM migration, which is a planned feature for tiCrypt.
- Proxy performance: The Linux bridge solution forces the use of "software proxying" for external access to VMs. This is much slower than a firewall-based solution, which in turn requires a unified network architecture across hosts.
- Rigid network integration: LibVirt, when using the Linux bridge architecture, only supports a few setups (`nat`, `route`, and `open`). This makes it difficult to deal with custom firewall rules on the hosts and backend.
OpenVSwitch Solution
OpenVSwitch provides virtual switching capabilities similar to those of "real" switches, but implemented in software. It is a high-performance solution integrated into the Linux kernel. OpenVSwitch supports advanced features such as VLANs, QoS, and network virtualization. It also provides a unified network architecture across hosts by extending the Layer 2 network through switching.
The specific benefits for tiCrypt include:
- Unified network architecture: OpenVSwitch provides a unified network architecture across hosts, allowing for easier management and control of the network. Specifically, it allows all the VMs to be on the same network, regardless of which host they are running on. This allows VM-to-VM communication, a feature needed by Clustering and Batch Jobs (via Slurm).
- Unified IP address management: Since all the VMs are on the same Layer 2 network, IP address management is simplified. Specifically, by running a DHCP server on the tiCrypt backend, all the VM IPs are automatically managed centrally.
- Single exit point: The network and firewall rules can be configured to allow only the backend as an exit point, thus enhancing security and auditing.
- Support for per-VM allow lists: This allows a more refined control of the external access for the VMs. Specifically, instead of allowing all the VMs to access an external server, only specific VMs can be allowed.
- Network isolation and control: By creating multiple virtual networks based on different OpenVSwitch bridges, tiCrypt can provide network isolation and control for different types of VMs. The secure VMs will use a different network than the data-in and service VMs. This is further enhanced by the use of VLANs, supported by OpenVSwitch, that extend the network isolation through the real switches.
- Improved performance: Extending the network all the way to the backend allows a firewall-based proxying solution that leverages the NFTables support in the Linux kernel. This is much faster than the software proxying solution required by the Linux bridge architecture.
- No LibVirt interference: When the `openvswitch` network type is used in LibVirt, it only creates the OpenVSwitch port and does not interfere with the network configuration. This allows tiCrypt to have full control over the network configuration and firewall rules on the hosts and backend.
General Approach to Building Networks with OpenVSwitch in tiCrypt
The basic architecture we are interested in consists of:
- A backend server or Virtual Machine (VM) that runs the tiCrypt backend services and acts as a gateway for the VMs.
- Multiple hosts that run the VMs. The VM networking is provisioned on the hosts and must integrate with the backend network.
- The network must be isolated and have only the backend as an exit point.
The general approach to build such a network with OpenVSwitch is as follows (exemplified by the secure network):
- Assign a dedicated VLAN ID for the network (e.g., `1081` for the `secure` network).
  - This VLAN ID must be assigned at the level of the organization (at least for complex setups that use multiple switches).
  - The real switches that connect the backend and VM hosts must be configured to allow the VLAN traffic (e.g., by configuring the ports as trunk ports).
- Assign a dedicated IP range for the network (e.g., `192.168.128.0/17` for the `secure` network).
  - This IP range is private and only used by the private network. It is not routable on the internet and it is only used for communication between the VMs and the backend. It does not need to be unique across different networks since the networks are isolated, but it must not overlap with the IP range of the host network or the backend network.
- Create an OpenVSwitch bridge on each host (e.g., `br-secure`). This bridge will be used to connect the VMs to the network and it will extend the network to all the hosts and the backend.
  - The bridge must be configured to use the VLAN ID assigned for the network (e.g., by adding a VLAN interface to the bridge).
  - The bridge must be defined on a network (virtual or real) interface that is connected to the correct switch and VLAN. This is best accomplished by creating virtual network interfaces with VLAN IDs on a common physical interface such as `bond0`.
- Create a network interface on the backend and connect it to the same VLAN (e.g., by creating a VLAN interface on the backend and connecting it to the real switch).
- Configure the backend to act as a gateway for the network. This involves:
  - Assigning the first IP address in the network range to the backend (e.g., `192.168.128.1` for the `secure` network).
  - Running a DHCP server on the backend to assign IP addresses to the VMs. The DHCP server must be configured to assign IP addresses from the network range and to set the backend IP as the default gateway for the VMs. The preferred way is using `dnsmasq` since it is lightweight and allows both DHCP and DNS services. DNS control is very important for the secure network.
  - Configuring firewall rules on the backend to control the external access (based on network type).
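The DHCP part of the gateway setup can be sketched as follows. This is a minimal, hypothetical Python helper that renders a dnsmasq configuration fragment for one network. The function name, the backend interface name `bond0.1081`, the 12h lease time, and the choice to advertise the backend as the DNS server are assumptions made for illustration; the addresses are the example values used in this document. It is not the configuration that tiCrypt itself generates.

```python
import ipaddress


def render_dnsmasq_fragment(interface, cidr, dhcp_start, dhcp_end, lease="12h"):
    """Render a dnsmasq fragment that serves DHCP on one isolated network.

    The gateway is taken to be the first address after the network address,
    matching the "first IP address in the network range" rule in the text.
    """
    network = ipaddress.ip_network(cidr)
    gateway = network.network_address + 1
    start = ipaddress.ip_address(dhcp_start)
    end = ipaddress.ip_address(dhcp_end)

    if start not in network or end not in network:
        raise ValueError("DHCP range must lie inside the network range")
    if not (gateway < start <= end):
        raise ValueError("DHCP range must start after the gateway address")

    lines = [
        f"interface={interface}",          # assumed backend VLAN interface
        "bind-interfaces",
        f"dhcp-range={start},{end},{network.netmask},{lease}",
        f"dhcp-option=option:router,{gateway}",      # backend as default gateway
        f"dhcp-option=option:dns-server,{gateway}",  # assumption: backend answers DNS
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dnsmasq_fragment(
        "bond0.1081", "192.168.128.0/17", "192.168.129.1", "192.168.255.254"
    ))
```

Keeping the gateway address outside the DHCP range, as the sketch enforces, prevents the DHCP server from ever leasing the backend's own address to a VM.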
The network architecture is flat and based on Layer 2 switching. ARP/RARP is used to "discover" which machine holds a given IP so that traffic is sent to the correct destination.
Each network type will need a full and independent setup as described above. There is no intersection point between the different network types since they are isolated. The only common point is the backend which will have multiple interfaces, one for each network type.
No routing rules should be used with this architecture since it is based on Layer 2 switching. All the traffic must be switched at Layer 2 and the backend must be the only exit point for the VMs. Masquerading rules will be used to allow external traffic (as required by the network type). These networks must be completely isolated from any other networks.
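As an illustration of the masquerading mentioned above, the following sketch renders a small nftables ruleset for a network that is allowed unrestricted external access. It is a hedged example only: the table name `example_nat`, the chain name, the uplink interface `eth0`, and the source range `192.168.122.0/24` are assumptions chosen for illustration, and the rules that tiCrypt actually installs are not shown in this document and are more restrictive for the secure network.

```python
def render_masquerade_table(table, cidr, uplink):
    """Render an nftables table that masquerades traffic from `cidr`
    when it leaves the backend through the `uplink` interface."""
    return (
        f"table ip {table} {{\n"
        f"    chain postrouting {{\n"
        f"        type nat hook postrouting priority 100; policy accept;\n"
        f'        ip saddr {cidr} oifname "{uplink}" masquerade\n'
        f"    }}\n"
        f"}}\n"
    )


if __name__ == "__main__":
    # All three arguments are hypothetical example values.
    print(render_masquerade_table("example_nat", "192.168.122.0/24", "eth0"))
```

The rendered text could be saved to a file and loaded with `nft -f`. Keeping the rule in its own table means it can be replaced or removed without touching other firewall rules on the backend.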
Specific Implementation in tiCrypt
- Secure VMs
- Service VMs
- Data-Get VMs
Secure VMs are the primary workload in tiCrypt. They operate on a fully isolated network with strict firewall rules, controlled DNS, and per-VM allow lists governing external access. All data stored on their drives is encrypted, and network traffic is tightly audited through the backend gateway.
Service VMs boot a VM image on a network with general internet access, allowing administrators to quickly install packages, apply updates, and make changes to the underlying boot image without rebuilding it from scratch. They operate on a separate, unrestricted network since their purpose is image preparation rather than secure workloads.
Data-Get VMs are purpose-built VMs launched with newly formatted drives and general internet access. Their sole function is to pull data from outside sources onto those drives, which can then be detached and attached to Secure VMs for use in protected research environments. Like Service VMs, they operate on an unrestricted network since they do not handle sensitive data directly.
The following diagram illustrates the network architecture of tiCrypt based on OpenVSwitch (all IP ranges and VLAN IDs are examples):
Some specifics are:
- Three separate networks are created using independent components:
  - secure: This network is used for the secure VMs (both interactive and batch).
    - OpenVSwitch bridge `br-secure` on interface `enp1/bond0.1081`
    - VLAN ID `1081`
    - IP range `192.168.128.0/17`
    - Gateway (backend IP): `192.168.128.1`
    - DHCP range: `192.168.129.1:192.168.255.254`
    - Highly controlled masquerading rules (based on the `ticrypt-nft` and `ticrypt-firewall` services running on the backend) to allow only specific external access based on the VM allow list.
    - DNS strictly controlled to only servers in the allowed list (and tiCrypt backend).
  - service: This network is used for the service VMs.
    - OpenVSwitch bridge `br-service` on interface `enp2/bond0.1082`
    - VLAN ID `1082`
    - IP range `192.168.122.0/24`
    - Gateway (backend IP): `192.168.122.1`
    - DHCP range: `192.168.122.3:192.168.122.254`
    - Masquerading rules to allow all external access since these VMs are not considered secure. DNS is not controlled since these VMs are not considered secure.
  - datain: This network is used for the data-in VMs.
    - OpenVSwitch bridge `br-datain` on interface `enp3/bond0.1083`
    - VLAN ID `1083`
    - IP range `192.168.123.0/24`
    - Gateway (backend IP): `192.168.123.1`
    - DHCP range: `192.168.123.3:192.168.123.254`
    - Masquerading rules and DNS control similar to the `service` network since these VMs are not considered secure.
- All the external traffic from VMs is routed (via gateway definition) to the backend.
  - This allows strict control of the VM interaction with the external entities.
  - This also allows the VM hosts to be isolated from general external access since VM traffic is not exiting directly from the hosts but is routed through the backend.
- The only entry point for accessing VMs is the backend.
  - This hides the VMs in a private network and it allows full control of how the VMs can be reached. For the secure network, this allows the use of the Linux firewall based proxying that is much faster than the software proxying required by the Linux bridge architecture.
  - Attack surface is reduced and logging is simplified.
- The entire network setup can be controlled via firewall rules on the backend alone.
  - This massively simplifies management and reduces interference with other firewall rules deployed on the backend or VM hosts.
  - All the tiCrypt related firewall rules can be confined to the `ticrypt` NFT table that can independently be managed by the `ticrypt-firewall` service running on the backend.
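The example address plan listed above can be sanity-checked with a few lines of Python. The sketch below uses only the standard `ipaddress` module; the `PLAN` dictionary and the `check_plan` function are hypothetical names introduced for this illustration. Because the backend holds one interface per network, the sketch also treats overlap between the three ranges as a problem, which is an assumption of the sketch rather than a rule stated in this document.

```python
import ipaddress
from itertools import combinations

# Example values from this document: (IP range, gateway, DHCP start, DHCP end).
PLAN = {
    "secure": ("192.168.128.0/17", "192.168.128.1", "192.168.129.1", "192.168.255.254"),
    "service": ("192.168.122.0/24", "192.168.122.1", "192.168.122.3", "192.168.122.254"),
    "datain": ("192.168.123.0/24", "192.168.123.1", "192.168.123.3", "192.168.123.254"),
}


def check_plan(plan):
    """Return a list of problems found in the address plan (empty if none)."""
    problems = []
    networks = {}
    for name, (cidr, gateway, start, end) in plan.items():
        net = ipaddress.ip_network(cidr)
        networks[name] = net
        gw, lo, hi = (ipaddress.ip_address(a) for a in (gateway, start, end))
        # The backend is expected to hold the first address of the range.
        if gw != net.network_address + 1:
            problems.append(f"{name}: gateway is not the first address of {net}")
        if lo not in net or hi not in net or lo > hi:
            problems.append(f"{name}: DHCP range is not inside {net}")
        if lo <= gw <= hi:
            problems.append(f"{name}: gateway lies inside the DHCP range")
    # Assumption of this sketch: ranges reachable from the backend must not overlap.
    for (a, net_a), (b, net_b) in combinations(networks.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{a} and {b} overlap")
    return problems


if __name__ == "__main__":
    for problem in check_plan(PLAN) or ["no problems found"]:
        print(problem)
```

A check of this kind is cheap to run before any interface or DHCP configuration is applied, when a mistake in the plan is still easy to correct.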
It is possible to define 3 independent networks with 3 different switches and avoid the use of VLANs, but this is usually wasteful. The preferred solution is to use a high-performance bonded interface (`bond0`) and create virtual interfaces with VLAN IDs on top of it. This allows the use of a single physical interface for all the networks while still providing isolation and control.
The interface `bond0` can be shared with other networks that are required, for example, for management or storage. The only requirement is that the real switch ports connected to `bond0` must be configured to allow the VLAN traffic for the tiCrypt networks. The network isolation provided by the VLANs through virtual interfaces, OpenVSwitch bridges, and real switches is sufficient to allow the coexistence of the tiCrypt networks with other networks on the same physical interface.
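The host-side part of this layout (a VLAN interface on top of `bond0`, attached to an OpenVSwitch bridge) can be sketched as a list of commands. The Python helper below only builds and prints the commands; it does not run them. The function name is hypothetical, the commands are not persistent across reboots, and this is an illustrative sketch under those assumptions rather than the procedure used by the tiCrypt setup tooling.

```python
import shlex


def host_network_commands(parent, vlan_id, bridge):
    """Return the argv lists that attach one VLAN of `parent` to an OVS bridge."""
    vlan_if = f"{parent}.{vlan_id}"
    return [
        # Create the VLAN interface on top of the shared physical interface.
        ["ip", "link", "add", "link", parent, "name", vlan_if,
         "type", "vlan", "id", str(vlan_id)],
        # Bring it up; no IP address is assigned on VM hosts.
        ["ip", "link", "set", vlan_if, "up"],
        # Create the bridge and plug the VLAN interface in as its uplink port.
        ["ovs-vsctl", "--may-exist", "add-br", bridge],
        ["ovs-vsctl", "--may-exist", "add-port", bridge, vlan_if],
    ]


if __name__ == "__main__":
    # Example values from this document for the three networks.
    for vlan_id, bridge in [(1081, "br-secure"), (1082, "br-service"), (1083, "br-datain")]:
        for argv in host_network_commands("bond0", vlan_id, bridge):
            print(shlex.join(argv))
```

Building the commands as argument lists keeps them easy to review and to pass to `subprocess.run` later without shell quoting issues.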
Puppet and NetworkManager Integration Considerations
Since the OpenVSwitch-based solution uses a separate NFT table (`ticrypt`) and separate virtual network interfaces (e.g., `ens1`), Puppet should be configured to "ignore" both the `ticrypt` NFT table and the virtual interfaces.
For VM hosts, the virtual network interfaces should be created but not assigned any IPs. This allows the OpenVSwitch bridges to use these interfaces without interference from Puppet. For the backend, the virtual network interfaces should be created and assigned the gateway IPs for each network (e.g., `192.168.128.1` for the secure network). This allows the backend to act as a gateway for the VMs while still allowing Puppet to manage the IP configuration of the backend.
It is best to let ticrypt-setup, the Ansible-based tiCrypt setup tool, configure both the backend and VM hosts. The number and complexity of tasks is significant; any part missing or misconfigured can result in an unusable system.
Conclusion and Future Work
The OpenVSwitch-based network architecture, already deployed in several production systems, provides a much more controlled and high-performance solution to networking in tiCrypt. Past the initial setup cost (re-configuring the network and tiCrypt services), the new architecture provides a much more flexible and scalable solution for the tiCrypt network. Going forward, the OpenVSwitch-based architecture will be the only supported network architecture for tiCrypt and the Linux bridge-based architecture will be deprecated and eventually removed.
The use of OpenVSwitch opens up new possibilities for the tiCrypt network architecture. Some of the future work includes:
- VM migration: The unified network architecture provided by OpenVSwitch allows for VM migration between hosts without any network reconfiguration. This is a planned feature for tiCrypt and it will be implemented using the live migration capabilities of LibVirt and OpenVSwitch.
- Network monitoring and management: OpenVSwitch provides a rich set of tools for monitoring and managing the network. This can be used to provide better visibility into the network traffic and to troubleshoot network issues.
- Stricter network isolation: OpenVSwitch supports OpenFlow and other advanced features that can be used to provide stricter network isolation and control. This can be used to further enhance the security of the secure network. Specifically, OpenFlow rules can be used in the future to limit the VM-to-VM and VM-to-backend communication in the secure network to only what is required for the specific use case.
